Abstract

The  effectiveness  of  Newton’s  method  for  finding  an  unconstrained  mini- 
mizer  of  a  strictly  convex  twice  continuously  differentiable  function  has  prompted 
the  proposal  of  various  modified  Newton  methods  for  the  nonconvex  case. 

Linesearch modified Newton methods utilize a linear combination of a descent
direction and a direction of negative curvature. If these directions are
sufficient in a certain sense, and a suitable linesearch is used, the resulting
method will generate limit points that satisfy the second-order necessary
conditions for optimality.

We propose an efficient method for computing a descent direction and a
direction of negative curvature that is based on a partial Cholesky
factorization of the Hessian. This factorization not only gives theoretically
satisfactory directions, but also requires only a partial pivoting strategy,
i.e., the equivalent of only two rows of the Schur complement need be examined
at each step.

Keywords:  Unconstrained  minimization,  modified  Newton  method,  descent 
direction,  negative  curvature,  Cholesky  factorization 


† Research partially supported by the Göran Gustafsson Foundation and the Swedish National
Board for Technical Development.

‡ Research supported by the Department of Energy Contract DE-FG03-92ER25117, the
National Science Foundation Grants DDM-9204208, DDM-920-1547, and the Office of Naval Research
Grant N00014-90-J-1242.

* This paper is simultaneously issued as Report TRITA-MAT-1993-9, Department of
Mathematics, Royal Institute of Technology; Report LMS 93-2, Department of Mathematics, University
of California at San Diego; and Report SOL 93-1, Department of Operations Research, Stanford
University. It supersedes part of Report SOL 89-12 "A modified Newton method for unconstrained
minimization", Department of Operations Research, Stanford University, 1989.




Partial  Cholesky  factorization 


1.  Introduction 

We consider the unconstrained minimization of a twice continuously differentiable
function $f : \mathbb{R}^n \to \mathbb{R}$. If $f$ is strictly convex, the excellent local
convergence properties of Newton's method make it one of the most effective methods
for minimization (see, e.g., Ortega and Rheinboldt [OR70]).

In the nonconvex case, various modified Newton methods have been proposed
that ensure convergence from an arbitrary starting point. Here we focus on the
class of linesearch modified Newton methods (for a complete discussion of modified
Newton methods and their relative merits, see, e.g., Shultz et al. [SSB85], Dennis
and Schnabel [DS89]). Linesearch modified Newton methods generate a sequence
$\{x_k\}_{k=0}^{\infty}$ of improving estimates of a local minimizer. At iteration $k$, a
linesearch is performed along a path formed from a linear combination of two
directions $s_k$ and $d_k$, where either $s_k$ or $d_k$ can be zero. The directions $s_k$ and
$d_k$ are chosen such that $g_k^T s_k \le 0$ and $d_k^T H_k d_k \le 0$, where $g_k$ and $H_k$ denote
the gradient $\nabla f(x)$ and Hessian $\nabla^2 f(x)$ evaluated at $x_k$. (Implicitly, we also
assume the condition $g_k^T d_k \le 0$, which can be imposed with a trivial sign change of
$d_k$.) Each nonzero $s_k$ satisfies $g_k^T s_k < 0$ and is known as a descent direction. Each
nonzero $d_k$ satisfies $d_k^T H_k d_k < 0$ and is known as a direction of negative curvature.
If $d_k$ is nonzero, $H_k$ must have at least one negative eigenvalue. (Henceforth we will
sacrifice precision for the sake of brevity and refer to the sequences $\{s_k\}$ and
$\{d_k\}$ as sequences of "descent directions" and "directions of negative curvature".)
Linesearch methods of this type have been proposed by Gill and Murray [GM74],
Fletcher and Freeman [FF77], McCormick [McC77], Mukai and Polak [MP78],
Kaniel and Dax [KD79], and Goldfarb [Gol80].

Moré and Sorensen [MS79] have shown that if: (i) a modified Newton method is
used in conjunction with a suitable linesearch; and (ii) the directions $s_k$ and $d_k$ are
sufficient in the sense that the sequences $\{s_k\}$ and $\{d_k\}$ are bounded and satisfy

$$ g_k^T s_k \to 0 \;\Longrightarrow\; g_k \to 0 \ \text{ and } \ s_k \to 0, \tag{1.1a} $$

and

$$ d_k^T H_k d_k \to 0 \;\Longrightarrow\; \min\{\lambda_{\min}(H_k), 0\} \to 0 \ \text{ and } \ d_k \to 0, \tag{1.1b} $$

then every limit point of the resulting sequence $\{x_k\}_{k=0}^{\infty}$ will satisfy the
second-order necessary conditions for optimality.

It has been observed in practice that the number of iterates at which the
Hessian is positive definite is large compared to the total number of iterations. Since
linesearch methods revert to Newton's method when the Hessian is sufficiently
positive definite, it would seem sensible to use a modified Newton method based on
the most efficient method for solving a symmetric positive-definite system. This is
the motivation for the modified Cholesky factorization proposed by Gill and
Murray [GM74]. However, it has been shown by Moré and Sorensen [MS79] that this
factorization may not give directions of negative curvature that are sufficient in the
sense of (1.1b). This paper is motivated by the need for an algorithm with the
efficiency and simplicity of the Cholesky factorization, but with the guarantee of
convergence when used with a suitable linesearch. It is shown in Section 3 that a
partial Cholesky factorization can give search directions that are sufficient in the
sense of (1.1).

To simplify the notation, we will drop the subscript $k$ when referring to the
quantities $g_k$, $H_k$, $s_k$ and $d_k$ at a specific iteration. Unless otherwise stated,
$\|\cdot\|$ refers to the vector two-norm or its induced matrix norm. The vector $e_j$
denotes the $j$-th unit vector whose dimension is determined by the context.


2.  The  partial  Cholesky  factorization 


The partial Cholesky factorization of $H$ is a variant of the standard Cholesky
factorization with diagonal pivoting. The algorithm is stated in outer-product form,
where the Schur complement associated with the unfactorized part of $H$ is
updated explicitly at each step (see, e.g., Golub and Van Loan [GV89, page 143] and
Higham [Hig90]).

At each step, the largest diagonal is selected as pivot and is used to eliminate a
row and column from the Schur complement. The algorithm continues until either
all of the matrix has been factorized or the pivot is considered unacceptable. The final
factors are therefore uniquely determined by the rule used to accept the pivot (i.e.,
the rule used to terminate the elimination). Termination is controlled by a
preassigned scalar parameter $\nu$ ($0 < \nu < 1$). A pivot is acceptable if it is both positive
and larger in absolute value than $\nu$ times the off-diagonal of largest magnitude in
the pivot row and column. At each step, the determination of an acceptable pivot
requires the examination of the diagonals and a single row of the Schur
complement. (For a similar scheme in the context of quadratic programming, see Casas
and Pola [CP90].)

It will be shown below that once a pivot is deemed unacceptable (and hence
the factorization is terminated), a suitable direction of negative curvature can be
determined from the elements of the remaining Schur complement.

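Let $P$ denote the permutation matrix representing the symmetric interchanges
performed during the factorization. If $n_1$ denotes the number of steps needed before
termination, the factorization implicitly identifies a leading $n_1 \times n_1$ positive-definite
submatrix of the permuted matrix $P^T H P$. In terms of a partition $H_{11}$, $H_{12}$, $H_{21}$
and $H_{22}$ of $P^T H P$, we have

$$ P^T H P = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix}
 = \begin{pmatrix} L_{11} & 0 \\ L_{21} & I \end{pmatrix}
   \begin{pmatrix} B_1 & 0 \\ 0 & B_2 \end{pmatrix}
   \begin{pmatrix} L_{11}^T & L_{21}^T \\ 0 & I \end{pmatrix}, \tag{2.1} $$

where $L_{11}$ is unit lower triangular and $B_1$ is a positive-definite diagonal matrix.
The submatrix $H_{11}$ is positive definite, and $H_{11} = L_{11} B_1 L_{11}^T$ is its usual Cholesky
factorization obtained using diagonal pivoting. The factorization may be written
briefly as $H = L B L^T$, where $L$ is a row-permuted lower-triangular matrix with

$$ L = P \begin{pmatrix} L_{11} & 0 \\ L_{21} & I \end{pmatrix}
 \quad\text{and}\quad
 B = \begin{pmatrix} B_1 & 0 \\ 0 & B_2 \end{pmatrix}. \tag{2.2} $$

We will use $n_2$ to denote the size of $H_{22}$, so that $n_1 + n_2 = n$. A "pseudo-matlab"
version of the partial Cholesky algorithm is given in Algorithm 2.1.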
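The steps described above can also be sketched in plain Python. This is a minimal transcription under stated assumptions, not the authors' implementation: the function name `partchol`, the default `nu=0.75`, and the list-of-lists matrix representation are choices made here for illustration. It returns `L`, `B`, `perm` and `n1` such that `H = L B L^T`, with the rows of `L` taken in the order `perm` forming a unit lower-triangular matrix.

```python
def partchol(H, nu=0.75):
    """Partial Cholesky factorization with diagonal pivoting (illustrative sketch).

    H is a symmetric matrix given as a list of rows; nu in (0, 1) is the
    pivot-acceptance parameter (0.75 is an arbitrary illustrative default).
    """
    n = len(H)
    B = [[float(x) for x in row] for row in H]   # working copy, becomes block diagonal
    L = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    n1 = 0
    for k in range(n):
        # Largest remaining diagonal of the Schur complement.
        r = max(range(k, n), key=lambda i: B[i][i])
        pivot = B[r][r]
        # Largest off-diagonal magnitude in the pivot row of the Schur complement.
        mu = max((abs(B[r][j]) for j in range(k, n) if j != r), default=0.0)
        if not (pivot > 0.0 and pivot > nu * mu):
            break                                # unacceptable pivot: terminate
        # Symmetric interchange of rows and columns k and r.
        perm[k], perm[r] = perm[r], perm[k]
        B[k], B[r] = B[r], B[k]
        for row in B:
            row[k], row[r] = row[r], row[k]
        # Column k of L, stored by original row index.
        col = [B[i][k] / pivot for i in range(k, n)]
        for i in range(k, n):
            L[perm[i]][k] = col[i - k]
        # Outer-product update of the Schur complement, then clear row/column k.
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                B[i][j] -= col[i - k] * B[k][j]
        for i in range(k + 1, n):
            B[i][k] = 0.0
            B[k][i] = 0.0
        n1 = k + 1
    # Identity block for the unfactorized part.
    for i in range(n1, n):
        L[perm[i]][i] = 1.0
    return L, B, perm, n1


# Small indefinite example: the first pivot is accepted, the second is negative.
L, B, perm, n1 = partchol([[4.0, 2.0], [2.0, -3.0]])
```

For this 2 x 2 example the factorization stops after one step, so `n1` is 1 and the trailing 1 x 1 block of `B` holds the Schur complement.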


The curvature along any direction $d$ computed from the partial Cholesky
factorization is related to the magnitude of the smallest eigenvalue of the Schur
complement $B_2$. The following lemma relates the smallest eigenvalue of $B_2$ to the
smallest eigenvalue of $H$.

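Lemma 2.1. Let $H$ be a symmetric $n \times n$ matrix with at least one negative
eigenvalue. Let the partial Cholesky factorization of $H$ be denoted by $H = L B L^T$, where
$P^T H P$ is partitioned as in (2.1). Then

$$ \lambda_{\min}(B_2) \le \lambda_{\min}(H) \quad\text{and}\quad B_2 = Y^T H Y, $$

where

$$ Y = P \begin{pmatrix} -H_{11}^{-1} H_{12} \\ I \end{pmatrix}. \tag{2.3} $$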

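Proof. The inequality $\lambda_{\min}(B_2) \le \lambda_{\min}(H)$ can be established using the identity

$$ P^T H P = \begin{pmatrix} 0 & 0 \\ 0 & B_2 \end{pmatrix}
 + \begin{pmatrix} L_{11} \\ L_{21} \end{pmatrix} B_1
   \begin{pmatrix} L_{11}^T & L_{21}^T \end{pmatrix}, \tag{2.4} $$

which is a rearrangement of the factorization (2.1). The eigenvalues of $H$ and
$P^T H P$ are identical. Moreover, the positive-definiteness of $B_1$ implies that the
second term on the right-hand side of (2.4) is positive semidefinite. Since the
eigenvalues of $P^T H P$ cannot increase on subtraction of a positive semidefinite
matrix, it must follow that $\min\{0, \lambda_{\min}(B_2)\} \le \lambda_{\min}(H)$ (see, e.g., Golub and Van
Loan [GV89, page 411]). From the assumption $\lambda_{\min}(H) < 0$, we conclude that
$\lambda_{\min}(B_2) \le \lambda_{\min}(H)$, as required.

To show that the matrix $Y$ of (2.3) is well defined, it is sufficient to verify that
$H_{11}^{-1} H_{12} = L_{11}^{-T} L_{21}^T$. This is an immediate consequence of multiplying the
partitioned right-hand-side matrix from (2.1) to obtain $H_{11} = L_{11} B_1 L_{11}^T$ and
$H_{12} = L_{11} B_1 L_{21}^T$.

Finally, the identity $Y^T H Y = B_2$ may be verified by expressing $L^{-1} H L^{-T} = B$
in the partitioned form

$$ \begin{pmatrix} L_{11}^{-1} & 0 \\ -L_{21} L_{11}^{-1} & I \end{pmatrix}
   \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix}
   \begin{pmatrix} L_{11}^{-T} & -L_{11}^{-T} L_{21}^T \\ 0 & I \end{pmatrix}
 = \begin{pmatrix} B_1 & 0 \\ 0 & B_2 \end{pmatrix}, $$

from which the result follows. ∎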
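As a small worked illustration of the lemma, consider a 2 x 2 matrix. The matrix and the value $\nu = 3/4$ are chosen here purely for the example and do not come from the text; $P = I$ because the largest diagonal is already in the first position.

```latex
\begin{align*}
H &= \begin{pmatrix} 4 & 2 \\ 2 & -3 \end{pmatrix}, \qquad \nu = \tfrac34, \\
% Step 1: the pivot 4 is positive and 4 > \nu \cdot 2, so it is accepted.
L_{11} &= 1, \quad L_{21} = \tfrac12, \quad B_1 = 4, \quad
B_2 = -3 - \tfrac12 \cdot 2 = -4
\quad (\text{negative pivot, so } n_1 = 1), \\
Y &= \begin{pmatrix} -H_{11}^{-1} H_{12} \\ 1 \end{pmatrix}
   = \begin{pmatrix} -\tfrac12 \\ 1 \end{pmatrix}, \qquad
Y^T H Y = 4 \cdot \tfrac14 + 2 \cdot 2 \cdot \bigl(-\tfrac12\bigr) - 3 = -4 = B_2, \\
\lambda_{\min}(H) &= \tfrac12 \bigl( 1 - \sqrt{65} \bigr) \approx -3.53
\;\ge\; -4 = \lambda_{\min}(B_2).
\end{align*}
```

Both conclusions of the lemma can be read off directly: the Schur complement equals $Y^T H Y$, and its smallest eigenvalue lies below that of $H$.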

Note that the matrix $Y$ of (2.3) consists of the last $n_2$ columns of $L^{-T}$. Our analysis
requires bounds on the norms of $Y$, $L$ and $L^{-1}$, which are provided by the following
lemma given by Higham [Hig90].

Lemma 2.2. Let $H$ be factorized using the partial Cholesky factorization described
in Algorithm 2.1. If $P^T H P$ is partitioned as in (2.1), then

(a) $\|L_{11}^{-T} L_{21}^T\| \le \frac{1}{\nu} \sqrt{\tfrac13 (n - n_1)(4^{n_1} - 1)}$;

(b) $\|L_{11}^{-T} L_{21}^T e_j\| \le \frac{1}{\nu} \sqrt{\tfrac13 (4^{n_1} - 1)}$;

(c) $\|L\| \le n/\nu$;

(d) $\|L^{-1}\| \le n\, 2^{n_1 - 1}/\nu$.

Proof. Part (a) follows immediately from Lemma 9.4 of Higham [Hig90] and the
fact that the elements of $L_{21}$ are bounded in absolute value by $1/\nu$. Part (b) is
a consequence of part (a), since $L_{21}^T e_j$ is an $n_1$-vector whose elements are bounded
in absolute value by $1/\nu$. Part (c) follows from the fact that all elements of $L$ are
bounded by $1/\nu$ in absolute value. Similarly, part (d) is a consequence of the fact
that all elements of $L^{-1}$ are bounded by $2^{n_1 - 1}/\nu$ (see Higham [Hig90] for details). ∎


2.1.  Computation  of  the  descent  direction 


We now discuss the application of the partial Cholesky factorization to the
calculation of a descent direction $s_k$ satisfying (1.1a). Let $\bar B$ be any positive-definite
modification of $B$, i.e., $\bar B$ is a positive-definite matrix with $\|\bar B - B\|$ "small" and
$\bar B = B$ when $B$ is sufficiently positive definite. There are many choices for $\bar B$; for
example, consider the block-diagonal matrix $\bar B = \mathrm{diag}(B_1, I)$, where $I$ is the identity
matrix of order $n_2$. With this definition, when $n_1 = n$ and $H$ is sufficiently positive
definite, $\bar B = B$ and $s$ satisfies the usual Newton equations $Hs = -g$.

Lemma 2.3. Let $H$ be factorized using the partial Cholesky factorization described
in Algorithm 2.1 and assume that $P^T H P$ is partitioned as in (2.1). Let $\bar B$ be a
positive-definite modification of $B$, and let $s$ satisfy

$$ L \bar B L^T s = -g. \tag{2.5} $$

Then,

$$ -g^T s \ge \frac{\nu^2}{n^2 \lambda_{\max}(\bar B)} \|g\|^2
 \quad\text{and}\quad
 \|s\| \le \frac{n^2\, 4^{n_1 - 1}}{\nu^2 \lambda_{\min}(\bar B)} \|g\|. $$

Proof. From the definition of $s$ in (2.5) we have

$$ s = -L^{-T} \bar B^{-1} L^{-1} g. \tag{2.6} $$

Premultiplying (2.6) by $g^T$ gives

$$ -g^T s = g^T L^{-T} \bar B^{-1} L^{-1} g \ge \frac{1}{\|L\|^2 \lambda_{\max}(\bar B)}\, g^T g, $$

and the required lower bound on $-g^T s$ follows from part (c) of Lemma 2.2. To
obtain the bound on $\|s\|$ we derive the inequality $\|s\| \le \lambda_{\max}(\bar B^{-1}) \|L^{-1}\|^2 \|g\|$ by
taking norms of both sides of (2.6), substituting for $L$ from (2.2) and using norm
inequalities. The required upper bound follows from part (d) of Lemma 2.2. ∎
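A small worked instance may help fix ideas. The matrix, the gradient and the value $\nu = 3/4$ below are illustrative assumptions, and $\bar B = \mathrm{diag}(B_1, I)$ is the block-diagonal choice mentioned above; the bounds are evaluated with $\|L\| \le n/\nu$ and $\|L^{-1}\| \le n\,2^{n_1-1}/\nu$.

```latex
\begin{align*}
H &= \begin{pmatrix} 4 & 2 \\ 2 & -3 \end{pmatrix}, \quad
g = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad
L = \begin{pmatrix} 1 & 0 \\ \tfrac12 & 1 \end{pmatrix}, \quad
\bar B = \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}
\quad (n = 2,\ n_1 = 1), \\
L^{-1} g &= \begin{pmatrix} 1 \\ -\tfrac12 \end{pmatrix}, \qquad
\bar B^{-1} L^{-1} g = \begin{pmatrix} \tfrac14 \\ -\tfrac12 \end{pmatrix}, \qquad
s = -L^{-T} \bar B^{-1} L^{-1} g = \begin{pmatrix} -\tfrac12 \\ \tfrac12 \end{pmatrix}, \\
-g^T s &= \tfrac12 \;\ge\; \frac{\nu^2}{n^2 \lambda_{\max}(\bar B)} \|g\|^2
        = \frac{9/16}{4 \cdot 4} = \tfrac{9}{256}, \\
\|s\| &= \tfrac{1}{\sqrt 2} \;\le\; \frac{n^2\, 4^{n_1 - 1}}{\nu^2 \lambda_{\min}(\bar B)} \|g\|
       = \frac{4}{9/16} = \tfrac{64}{9}.
\end{align*}
```

Although $H$ is indefinite here, $s$ is a descent direction because the negative Schur complement has been replaced by the identity block in $\bar B$.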
2.2. Computation of the direction of negative curvature

The formula for $d$ is derived from a method for computing directions of negative
curvature in quadratic programming (see Forsgren et al. [FGM91]). The approach
is based on the observation that, in the positive-definite case, the Newton direction
is a minimizer of a quadratic model with gradient $g$ and Hessian $H$. In particular,
the Newton direction can be found by a quadratic programming algorithm that
minimizes the model function while successively releasing variables from temporarily
fixed values. This analogy can be extended to the indefinite case, where the variables
corresponding to $H_{22}$ are temporarily fixed at their current values, and a direction
of negative curvature is defined by releasing either one or two of the fixed variables.
This scheme corresponds to using a direction of negative curvature that is a multiple
of either $y_i$ or $y_i \pm y_j$, where $y_i$ and $y_j$ denote columns $i$ and $j$ of the matrix $Y$ of (2.3).
The following lemma shows how the indices $i$ and $j$ are determined from the elements
of $B_2 = Y^T H Y$.


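Lemma 2.4. On termination of the partial Cholesky factorization with diagonal
pivoting, let $P^T H P$ be partitioned as in (2.1). If $n_1 = n$, define $d = 0$. Otherwise,
if $n_1 < n$, define $d$ as follows. Given $\rho = \max_{i,j > n_1} |b_{ij}|$ and any pair of indices
$q$ ($q > n_1$) and $r$ ($r > n_1$) such that $|b_{qr}| = \rho$, let $d$ be the solution of

$$ L^T d = \sqrt{\rho}\, v, \quad\text{where}\quad
 v = \begin{cases} e_q & \text{if } q = r, \\
 \tfrac{1}{\sqrt 2} \bigl( e_q - \mathrm{sgn}(b_{qr})\, e_r \bigr) & \text{otherwise.} \end{cases} $$

Then $d = 0$ if $\lambda_{\min}(H) \ge 0$. Otherwise, $d$ satisfies

$$ \frac{d^T H d}{d^T d} \le \frac{3\nu^2 (1 - \nu)}{n_2 \bigl( 3\nu^2 + 2(4^{n_1} - 1) \bigr)}\, \lambda_{\min}(H)
 \quad\text{and}\quad
 -\frac{1}{n_2} \lambda_{\min}(H) \le d^T d \le
 -\frac{\bigl( 3\nu^2 + 2(4^{n_1} - 1) \bigr)^2}{9 \nu^4 (1 - \nu)}\, \lambda_{\min}(H). $$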

Proof. If $n_1 = n$, then $\lambda_{\min}(H) > 0$, and the lemma holds from the definition $d = 0$.
For the remainder of the proof, assume that $n_1 < n$.

First, it is necessary to show that $\gamma \le \nu\rho$, where $\gamma = \max\{\max_{i > n_1} b_{ii},\, 0\}$.
If the factorization terminates with $\gamma = 0$, the inequality $\gamma \le \nu\rho$ is trivially
satisfied. If the factorization terminates with $\gamma > 0$, there exists an index $t$ ($t > n_1$)
such that $b_{tt} = \gamma$. Since $\gamma$ must be an unacceptable pivot, we can infer that
$\gamma \le \nu \max_{j \ne t,\, j > n_1} |b_{jt}|$. Consequently, if $n_1 < n$, it must hold that $\gamma \le \nu\rho$.

Let $d_1$ and $v_1$ denote the first $n_1$ components of $P^T d$ and $v$ respectively. Similarly,
let $d_2$ and $v_2$ denote the last $n_2$ components of $P^T d$ and $v$. The definitions of $d$ and
$v$ imply that $\|v_1\| = 0$, $\|v_2\| = 1$, and $d_2 = \sqrt{\rho}\, v_2$. Therefore,

$$ d^T d = d_1^T d_1 + d_2^T d_2 \ge \rho\, v_2^T v_2 = \rho. \tag{2.7} $$

Similarly, the definition of $d$ and (2.2) imply that

$$ d^T d \le \bigl( 1 + \|L_{11}^{-T} L_{21}^T v_2\|^2 \bigr) \rho
 \le \Bigl( 1 + \frac{2(4^{n_1} - 1)}{3\nu^2} \Bigr) \rho, \tag{2.8} $$

where the last inequality follows from Lemma 2.2. Combining (2.7) and (2.8) yields

$$ \rho \le d^T d \le \Bigl( 1 + \frac{2(4^{n_1} - 1)}{3\nu^2} \Bigr) \rho. \tag{2.9} $$

Consider the case $\rho = 0$, which is equivalent to $H$ being positive semidefinite
and singular with $\lambda_{\min}(H) = 0$. In this case, (2.9) implies $d = 0$, as required.

Now assume that $\rho > 0$. First, if $q = r$, then $|b_{qq}| = \rho$. Since $b_{qq} \le \gamma \le \nu\rho < \rho$,
it must hold that $b_{qq} = -\rho$, and from the definition of $d$ we obtain the bound

$$ d^T H d = \rho\, b_{qq} \le -(1 - \nu) \rho^2. \tag{2.10} $$

Alternatively, if $q \ne r$, then the definition of $d$ yields

$$ d^T H d = \frac{\rho}{2} \bigl( b_{qq} + b_{rr} - 2 |b_{qr}| \bigr) \le \rho (\gamma - \rho) \le -(1 - \nu) \rho^2, \tag{2.11} $$

where the inequalities follow from the conditions $b_{qq} \le \gamma$, $b_{rr} \le \gamma$ and $\rho \ge \gamma/\nu$.

Since the magnitude of every element in $B_2$ is bounded by $\rho$, the Gershgorin
circle theorem and Lemma 2.1 imply

$$ \rho \ge -\frac{1}{n_2} \lambda_{\min}(B_2) \ge -\frac{1}{n_2} \lambda_{\min}(H). \tag{2.12} $$

Combining (2.9), (2.10), (2.11) and (2.12) we obtain

$$ \frac{d^T H d}{d^T d} \le -\frac{3\nu^2 (1 - \nu)}{3\nu^2 + 2(4^{n_1} - 1)}\, \rho
 \le \frac{3\nu^2 (1 - \nu)}{n_2 \bigl( 3\nu^2 + 2(4^{n_1} - 1) \bigr)}\, \lambda_{\min}(H), \tag{2.13} $$

as required.

Since, by definition, $\lambda_{\min}(H) \le d^T H d / d^T d$, the left-most inequality of (2.13) gives
an upper bound on $\rho$, which in conjunction with (2.9) and (2.12) gives the bounds
on $d^T d$ as

$$ -\frac{1}{n_2} \lambda_{\min}(H) \le d^T d \le
 -\frac{\bigl( 3\nu^2 + 2(4^{n_1} - 1) \bigr)^2}{9 \nu^4 (1 - \nu)}\, \lambda_{\min}(H). $$

∎
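The construction of $d$ can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function name `negative_curvature_direction` and its arguments are hypothetical, the blocks `L11`, `L21` and `B2` are assumed to come from a partial Cholesky factorization partitioned as in (2.1), and the returned vector is expressed in the permuted ordering, so the interchanges recorded in $P$ must still be undone by the caller.

```python
import math


def negative_curvature_direction(L11, L21, B2):
    """Return (d1, d2) stacked as one list, with d2 = sqrt(rho) * v2 and
    L11^T d1 = -L21^T d2, i.e. the solution of L^T d = sqrt(rho) v in
    permuted coordinates. Inputs are lists of rows."""
    n1, n2 = len(L11), len(B2)
    if n2 == 0:
        return [0.0] * n1                       # n1 = n: no negative curvature
    # rho and a position (q, r) of an element of largest magnitude in B2.
    rho, q, r = max((abs(B2[i][j]), i, j) for i in range(n2) for j in range(n2))
    if rho == 0.0:
        return [0.0] * (n1 + n2)
    v2 = [0.0] * n2
    if q == r:
        v2[q] = 1.0
    else:
        sign = 1.0 if B2[q][r] > 0 else -1.0
        v2[q] = 1.0 / math.sqrt(2.0)
        v2[r] = -sign / math.sqrt(2.0)
    d2 = [math.sqrt(rho) * x for x in v2]
    # Right-hand side -L21^T d2, then back substitution with the unit
    # upper-triangular matrix L11^T.
    d1 = [-sum(L21[i][j] * d2[i] for i in range(n2)) for j in range(n1)]
    for j in range(n1 - 1, -1, -1):
        d1[j] -= sum(L11[i][j] * d1[i] for i in range(j + 1, n1))
    return d1 + d2


# Blocks for a 3 x 3 example with n1 = 1 (values chosen for illustration).
d = negative_curvature_direction([[1.0]], [[-1.0], [-1.0]],
                                 [[0.0, -1.0], [-1.0, 0.0]])
```

Only the element of largest magnitude in `B2` is needed to select the direction; the triangular solve with `L11` is the only other work.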


This lemma gives a relation between the curvature along $d$ and the smallest
eigenvalue of $H$, which is the "best possible" curvature. The bound is exponential
in $n_1$, but the computational experiments discussed below indicate that the bound is
unlikely to be tight in practice. However, as in Higham [Hig90], we observe that
there do exist matrices whose bound is "almost" tight. For given $n$ ($n \ge 3$) and $\theta$,
define $L(\theta)$ and $B(\theta)$ as

$$ L(\theta) = \begin{pmatrix}
 1 & & & & & \\
 -\cos\theta & 1 & & & & \\
 \vdots & \ddots & \ddots & & & \\
 -\cos\theta & \cdots & -\cos\theta & 1 & & \\
 -\cos\theta & \cdots & -\cos\theta & -\cos\theta & 1 & \\
 -\cos\theta & \cdots & -\cos\theta & -\cos\theta & 0 & 1
 \end{pmatrix},
 \qquad
 B(\theta) = \begin{pmatrix}
 1 & & & & & & \\
 & \sin^2\theta & & & & & \\
 & & \sin^4\theta & & & & \\
 & & & \ddots & & & \\
 & & & & \sin^{2(n-3)}\theta & & \\
 & & & & & 0 & -1 \\
 & & & & & -1 & 0
 \end{pmatrix}. $$


Define $H(\theta) = L(\theta) B(\theta) L(\theta)^T$. If $\theta = 0$, it is shown in Lemma A.1 of Appendix A
that $\lambda_{\min}(H(0)) = -\tfrac12 \bigl( \sqrt{n^2 + 2n - 7} - n + 1 \bigr)$, where

$$ -1 < \lambda_{\min}(H(0)) \le -1 + \frac{4}{n + 1}. $$

If $\theta = 0$, the partial Cholesky factorization with diagonal pivoting gives $n_1 = 1$.
If $d(\theta)$ denotes the direction of negative curvature associated with $H(\theta)$, we obtain

$$ \frac{d(0)^T H(0)\, d(0)}{d(0)^T d(0)} = -\frac13, $$

and $d(0)$ is a satisfactory direction of negative curvature. However, if $\theta$ is nonzero, it
follows from the analysis of Higham [Hig90] that the partial Cholesky factorization
with diagonal pivoting will define $L(\theta)$ and $B(\theta)$ as factors with $n_1 = n - 2$ for all
$\theta \ne 0$. Moreover,

$$ \lim_{\theta \to 0} \frac{d(\theta)^T H(\theta)\, d(\theta)}{d(\theta)^T d(\theta)} = -\frac{3}{1 + 2 \cdot 4^{n-2}}, $$

and for $\theta$ near zero, the curvature along $d(\theta)$ is close to the worst possible value
predicted by Lemma 2.4 (see Higham [Hig90] for the details). This "pathological"
example arises because the principal submatrix of order $n - 2$ of $H(\theta)$ is positive
definite but arbitrarily close to being singular so that $\|H_{11}^{-1} H_{12}\|$ (or equivalently
$\|L_{11}^{-T} L_{21}^T\|$) is very large. This is reflected in arbitrarily small pivot elements.
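The growth in this example can be checked numerically. The sketch below takes as given that the factorization returns the factors $L(\theta)$ and $B(\theta)$ with $n_1 = n - 2$, as stated for nonzero $\theta$, and does not run the factorization itself; the function name `pathological_curvature_ratio` is hypothetical. It forms $d$ from the trailing block $B_2$ with rows $(0, -1)$ and $(-1, 0)$, for which $\rho = 1$ and $d^T H d = -1$.

```python
import math


def pathological_curvature_ratio(n, theta):
    """Curvature d^T H d / d^T d for the factors L(theta), B(theta), n1 = n - 2."""
    c = math.cos(theta)
    n1 = n - 2
    # B2 = [[0, -1], [-1, 0]] gives rho = 1 and d2 = (1, 1) / sqrt(2).
    d2 = [1.0 / math.sqrt(2.0)] * 2
    # Every entry of L21 is -cos(theta), so -L21^T d2 has equal components.
    d1 = [c * (d2[0] + d2[1])] * n1
    # Back substitution with L11^T: unit diagonal, -cos(theta) above it.
    for j in range(n1 - 1, -1, -1):
        d1[j] += c * sum(d1[j + 1:])
    dtd = sum(x * x for x in d1) + sum(x * x for x in d2)
    return -1.0 / dtd      # d^T H d = rho * (b_qq + b_rr - 2|b_qr|) / 2 = -1


ratio = pathological_curvature_ratio(6, 1.0e-6)
limit = -3.0 / (1.0 + 2.0 * 4.0 ** (6 - 2))
```

For small `theta` the computed `ratio` approaches `limit`, which shrinks by roughly a factor of four each time `n` increases by one.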

A numerical experiment was devised to investigate if the bound of Lemma 2.4
is likely to be sharp for an arbitrary indefinite matrix. Matlab 4.0 was used to
generate directions of negative curvature for a large set of random indefinite symmetric
matrices of order 50. Each $H$ was defined as $Q \Lambda Q^T$, with $Q$ a random orthogonal
matrix and $\Lambda$ a random diagonal matrix with at least one negative element. The
matrix $Q$ was obtained from the QR-factorization of a $50 \times 50$ matrix whose
elements were taken from an independent normal distribution with zero mean and unit
variance. The elements of $\Lambda$ were taken from an independent uniform distribution in
the interval $[-25, 25]$. Directions of negative curvature were computed with $\nu$-values
$\sqrt{\epsilon}$, 0.05, 0.10, ..., 0.95, and $1 - \sqrt{\epsilon}$, where $\epsilon$ denotes the machine precision. A new
random matrix was generated for each factorization, giving a total of 1500 matrices
for each value of $\nu$. Figure 2.1 gives the outcome of the computational experiment.
The three lines depict the maximum, mean, and minimum values of the ratio $r$ of
$d^T H d / d^T d$ to $\lambda_{\min}(H)$. Each "+" represents the value of $r$ for a particular value of
the parameter $\nu$.


Figure 2.1: Curvature ratio r as a function of ν.

The bound on $r$ given by Lemma 2.4 is approximately maximized for $\nu = 2/3$. If,
for $n = 50$, this optimal value gives $n_1 = 49$, the theoretical bound is approximately
$7 \times 10^{-31}$. This should be compared with the computed values of $r$, which never
fell below 0.05 when $\nu$ was larger than 0.5. The minimum value of $r$ attained a
maximum of 0.0809 for $\nu = 0.9$. Based on these results, we would recommend a
value of $\nu$ in the range (0.5, 0.S5). Note that the larger the value of $\nu$, the smaller
the value of $n_1$ and consequently, the smaller the amount of computation.
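The size of the theoretical bound quoted above is easy to reproduce. The sketch below evaluates the factor multiplying the smallest eigenvalue in the curvature bound of Lemma 2.4, taken here to be $3\nu^2(1-\nu) / \bigl(n_2(3\nu^2 + 2(4^{n_1}-1))\bigr)$; the function name `curvature_bound_factor` and the grid of trial values are illustrative choices.

```python
def curvature_bound_factor(nu, n1, n2):
    """Factor c in the bound d^T H d / d^T d <= c * lambda_min(H)."""
    return 3.0 * nu ** 2 * (1.0 - nu) / (n2 * (3.0 * nu ** 2 + 2.0 * (4.0 ** n1 - 1.0)))


# Case discussed in the text: n = 50 with n1 = 49, hence n2 = 1.
bound = curvature_bound_factor(2.0 / 3.0, 49, 1)

# Coarse grid search for the value of nu that maximizes the factor.
best_nu = max((i / 100.0 for i in range(1, 100)),
              key=lambda nu: curvature_bound_factor(nu, 49, 1))
print(bound, best_nu)
```

Because the term $4^{n_1}$ dominates the denominator, the maximizer is essentially that of $\nu^2(1-\nu)$, which is why the value 2/3 appears.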


3.  Theoretical  results 

The partial Cholesky factorization can be used as the basis for a descent method for
minimizing a twice-continuously differentiable function $f : \mathbb{R}^n \to \mathbb{R}$. This method
defines a sequence $\{x_k\}_{k=0}^{\infty}$ of improving estimates of a local minimizer.

Let $x_0$ be any starting point such that the level set $\{x \mid f(x) \le f(x_0)\}$ is compact.
Let $\{s_k\}$ and $\{d_k\}$ be bounded sequences such that each $s_k$ is a descent direction
that satisfies (1.1a) and each $d_k$ is a direction of negative curvature that satisfies
(1.1b). Moré and Sorensen [MS79] show that with an appropriate linesearch, certain
linear combinations of $s_k$ and $d_k$ define $x_{k+1}$ so that every limit point of $\{x_k\}_{k=0}^{\infty}$
will satisfy the second-order necessary conditions for optimality; i.e., at every limit
point $\bar x$, $\nabla f(\bar x)$ is zero and $\nabla^2 f(\bar x)$ is positive semidefinite. The main result of this
paper, that the search directions obtained using the partial Cholesky factorization
are sufficient in the sense of Moré and Sorensen [MS79], is stated in the following
theorem.

Theorem 3.1. Let $\{x_k\}_{k=0}^{\infty}$ be a sequence of iterates contained in a compact region
of $\mathbb{R}^n$, and assume that $f : \mathbb{R}^n \to \mathbb{R}$ is a twice-continuously differentiable function.
For each $k$, define $g_k = \nabla f(x_k)$ and $H_k = \nabla^2 f(x_k)$, and let $H_k = L_k B_k L_k^T$ be the
partial Cholesky factorization of $H_k$ as described in Algorithm 2.1. Given positive
constants $c_1$ and $c_2$ ($c_1 \le c_2$), let $s_k$ be defined from Lemma 2.3 with the additional
requirement that $c_1 \le \lambda_{\min}(\bar B_k) \le \lambda_{\max}(\bar B_k) \le c_2$. Finally, let $d_k$ be defined from
Lemma 2.4. Then, $\{s_k\}$ and $\{d_k\}$ are bounded sequences such that

$$ g_k^T s_k \to 0 \;\Longrightarrow\; g_k \to 0 \ \text{ and } \ s_k \to 0 $$

and

$$ d_k^T H_k d_k \to 0 \;\Longrightarrow\; \min\{\lambda_{\min}(H_k), 0\} \to 0 \ \text{ and } \ d_k \to 0. $$

Proof. Since $\{x_k\}$ lies in a compact region, the smoothness of $f$ implies that $\{\|g_k\|\}$
and $\{\|H_k\|\}$ are bounded.

With the existence of $c_1$ and $c_2$, and the boundedness of $\|g_k\|$, Lemma 2.3 implies
that $\{s_k\}$ is a bounded sequence, and $g_k^T s_k \to 0$ implies $g_k \to 0$ and $s_k \to 0$, as
required.

Lemma 2.4 and the boundedness of $\|H_k\|$ imply that $\{d_k\}$ is a bounded sequence,
and $d_k^T H_k d_k \to 0$ implies $d_k \to 0$ and $\min\{\lambda_{\min}(H_k), 0\} \to 0$, as required. ∎

If $\nabla^2 f(x_k)$ is sufficiently positive definite, all pivots will be acceptable and the
partial Cholesky factorization will terminate with $n_1 = n$. This implies that if
$\{x_k\}_{k=0}^{\infty}$ has a limit point $\bar x$ at which $\nabla^2 f(\bar x)$ is sufficiently positive definite, then the
iterates will be identical to those of Newton's method for $k$ sufficiently large.
4. Discussion

The partial Cholesky factorization may be implemented in other ways. For example,
the calculation of the matrix $H_{11}$ can be made independent of the calculation of the
descent direction $s_k$. Once a direction of negative curvature has been defined, a
descent direction can be calculated by forming the modified Cholesky factorization
of $B_2$ (see, e.g., Gill and Murray [GM74], Schnabel and Eskow [SE90]).

The algorithm of Section 2.2 requires the examination of the diagonals and a
single row of the Schur complement at each step. Alternative strategies can be devised
in which the complete Schur complement is examined under certain exceptional
circumstances. For example, if a pivot is small, the pivot acceptance criterion could be
strengthened so that a pivot is acceptable if, in addition to the requirements of
Algorithm 2.1, it is larger in absolute value than $\nu b_{\max}$, where $b_{\max}$ is either the diagonal
of largest magnitude in the Schur complement or the element of largest magnitude
in the full Schur complement. Each of these modifications gives an algorithm with
identical theoretical properties, but a potentially smaller value of $n_1$. However, this
potential improvement is at the expense of an increase in the number of
comparisons during the factorization. The pivot criterion that requires the examination of
the full Schur complement would cope successfully with the "pathological" $H(\theta)$ of
Section 2.2 since the factorization would terminate after one step for $\theta$ sufficiently
small.
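The basic and strengthened acceptance rules can be written as a single predicate. This is a sketch under stated assumptions: the function name `pivot_acceptable` and its argument names are hypothetical, and the decision of when a pivot counts as "small" (and hence when to pass `b_max`) is left to the caller because the text does not specify it.

```python
def pivot_acceptable(pivot, row_offdiag_max, nu, b_max=None):
    """Pivot test for the partial Cholesky factorization.

    pivot            largest diagonal of the current Schur complement
    row_offdiag_max  largest off-diagonal magnitude in the pivot row
    nu               preassigned parameter in (0, 1)
    b_max            optional: largest diagonal magnitude, or largest element
                     magnitude, of the full Schur complement
    """
    ok = pivot > 0.0 and pivot > nu * row_offdiag_max
    if b_max is not None:
        # Strengthened rule: the pivot must also exceed nu * b_max.
        ok = ok and pivot > nu * b_max
    return ok


basic = pivot_acceptable(1.0, 1.0, 0.75)
strengthened = pivot_acceptable(1.0, 1.0, 0.75, b_max=2.0)
```

In this example the pivot passes the basic test but fails the strengthened one, so the factorization would stop earlier and produce a smaller value of $n_1$.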

5.  Summary 

We have shown how a partial Cholesky factorization can be used to define search
directions suitable for a linesearch-based modified Newton method. The resulting
directions are sufficient in the sense that it is possible to generate a sequence
$\{x_k\}_{k=0}^{\infty}$ with limit points having a zero gradient and a positive-semidefinite Hessian.

To our knowledge, this is the first triangular factorization that not only gives
theoretically satisfactory directions, but also requires only a partial pivoting
strategy, i.e., the equivalent of only two rows of the Schur complement need be examined
at each step.
A. Eigenvalues of $H(0)$

Lemma A.1. Let the $n \times n$ matrices $L(0)$ and $B(0)$ be defined as in Section 2 for
$\theta = 0$ and $n \ge 3$. Define $H(0) = L(0) B(0) L(0)^T$. Then
$\lambda = -\tfrac12 \bigl( \sqrt{n^2 + 2n - 7} - n + 1 \bigr)$ is the smallest eigenvalue of $H(0)$, and
$-1 < \lambda \le -1 + 4/(n + 1)$.

Proof. It is straightforward to verify that

$$ H(0) = \begin{pmatrix}
 1 & -1 & \cdots & -1 & -1 & -1 \\
 -1 & 1 & \cdots & 1 & 1 & 1 \\
 \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
 -1 & 1 & \cdots & 1 & 1 & 1 \\
 -1 & 1 & \cdots & 1 & 1 & 0 \\
 -1 & 1 & \cdots & 1 & 0 & 1
 \end{pmatrix}. $$

Since $B(0)$ has one negative eigenvalue and $L(0)$ is nonsingular, Sylvester's law
of inertia implies that $H(0)$ has one negative eigenvalue (see, e.g., Golub and Van
Loan [GV89, page 416]). Consequently, since $\lambda$ is negative for $n \ge 3$, it is enough
to show that it is an eigenvalue.

Assume that $v = (1 \ \ {-1} \ \ {-1} \ \cdots \ {-1} \ \ a \ \ a)^T$ is an eigenvector of $H(0)$ for some
scalar $a$. Then, if $v$ is an eigenvector, there must exist a $\lambda$ such that

$$ n - 2 - 2a = \lambda \quad\text{and} \tag{A.1a} $$

$$ -n + 2 + a = \lambda a. \tag{A.1b} $$

It is straightforward to show that for $n \ge 3$, (A.1) has a negative solution $\lambda$ given
by

$$ \lambda = -\frac{\sqrt{n^2 + 2n - 7} - n + 1}{2}
 \quad\text{and}\quad
 a = \frac{\sqrt{n^2 + 2n - 7} + n - 3}{4}. $$

The upper and lower bounds on $\lambda$ follow from the sequence of inequalities

$$ n + 1 > \sqrt{(n + 1)^2 - 8} = (n + 1) \sqrt{1 - \frac{8}{(n + 1)^2}} \ge n + 1 - \frac{8}{n + 1}. $$

(Note that the lower bound can also be obtained directly from Lemma 2.1.) ∎
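The eigenpair used in the proof can be verified numerically for a particular $n$. The sketch below builds $H(0)$ entrywise from the pattern displayed in the proof and checks the residual $H(0)v - \lambda v$; the helper names `h0` and `eigenpair` and the choice $n = 8$ are illustrative.

```python
import math


def h0(n):
    """H(0): -1 in the off-diagonal of the first row and column, ones elsewhere,
    except zeros in positions (n-1, n) and (n, n-1)."""
    H = [[1.0] * n for _ in range(n)]
    for j in range(1, n):
        H[0][j] = H[j][0] = -1.0
    H[n - 2][n - 1] = H[n - 1][n - 2] = 0.0
    return H


def eigenpair(n):
    """Closed-form eigenvalue and eigenvector from the proof of Lemma A.1."""
    root = math.sqrt(n * n + 2 * n - 7)
    lam = -0.5 * (root - n + 1)
    a = (root + n - 3) / 4.0
    v = [1.0] + [-1.0] * (n - 3) + [a, a]
    return lam, v


n = 8
lam, v = eigenpair(n)
Hv = [sum(h * x for h, x in zip(row, v)) for row in h0(n)]
residual = max(abs(y - lam * x) for y, x in zip(Hv, v))
```

A residual at rounding level confirms that $\lambda$ is an eigenvalue for this $n$; the inertia argument in the proof is what shows it is the smallest one.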
Algorithm 2.1. An algorithm for the partial Cholesky factorization

%PARTCHOL  Partial Cholesky factorization routine for a real symmetric
%          matrix H.
%          [L,B,perm,n1] = partchol(H)
%          forms a permutation perm, a unit lower-triangular matrix
%          L(perm,:) and a block diagonal matrix B such that L*B*L' = H
%          using the partial Cholesky factorization with diagonal pivoting.
%          The size of the positive-definite principal submatrix obtained
%          in the factorization is denoted by n1.
function [L,B,perm,n1] = partchol(H)
n    = length(H);
perm = 1:n;
B    = H;
L    = zeros(n);
nu   = 0.75;     % preassigned scalar in (0,1); 0.75 is an illustrative value
k    = 1;
n1   = 0;
while k <= n
   % Largest remaining diagonal; the leading zeros make r a global index.
   [pivot,r] = max([zeros(1,k-1) diag(B(k:n,k:n))']);
   % Largest off-diagonal magnitude in the pivot row.
   if k < n
      mu = max(abs(B(r,[1:r-1 r+1:n])));
   else
      mu = 0;
   end
   if pivot > 0 && pivot > nu*mu
      % Acceptable pivot: interchange, then eliminate row and column k.
      n1 = k;
      perm([k r]) = perm([r k]);
      B([k r],:)  = B([r k],:);
      B(:,[k r])  = B(:,[r k]);
      L(perm(k:n),k) = B(k:n,k)/B(k,k);
      if k < n
         B(k+1:n,k+1:n) = B(k+1:n,k+1:n) - L(perm(k+1:n),k)*B(k,k+1:n);
         B(k+1:n,k) = zeros(n-k,1);
         B(k,k+1:n) = zeros(1,n-k);
      end
      k = k+1;
   else
      % Unacceptable pivot: complete L with an identity block and stop.
      L(perm(k:n),k:n) = eye(n-k+1);
      k = n+1;
   end
end